gravicom

Andrea Kaplan
March 27, 2014

A web-based tool for community detection in networks

Outline

  • Introduction/Background
  • User Interface
  • Demo
  • Graphical Devices
  • Technical Aspects
  • Further Work

Introduction

Who cares and why did we make this?

Networks

  • Many relationships easily conceptualized as a graph/network
  • A graph is defined as a collection of nodes (entities) and edges (relationships)
  • Examples of such relationships include:
    • social networks (sociology)
    • the world wide web (computer science)
    • protein networks (biology)

Network Examples

Social network of a karate club. Internet map 1024 - transparent The protein interaction network of Treponema pallidum

Community Detection

  • A community is defined as a group of nodes in a graph that share properties
  • Community structure - collection of nodes which are densely linked to nodes within the community and sparsely linked to notes outside
  • Current methodology for community detection often involves an algorithmic approach; partitions a graph into node clusters iteratively before stopping criterion
  • First define an objective function and then optimize

Community Detection (Cont'd)

Example objective function: Girvan & Newman’s modularity measure

\[ Q = \sum\limits_r (e_{rr} - a_r^2) \] \( r \) = a community
\( e_{rr} \) = fraction of links that connect two nodes inside the community
\( a_r^2 \) = the fraction of links that have one or both vertices inside the community

The Problem

  • Search for an optimal modularity value is NP-hard
  • The number of possible partitions of the network requires \( 2^n \) complexity
  • In fact, the optimization of an objective function is typically an NP-hard problem

Current Solutions

  • Heuristics used to optimize the objective function in a reasonable amount of time
  • Heuristic-based clustering is useful because this offers an automated way to perform community detection
The main elements of the problem themselves [graph clustering], i.e. the concepts of community and partition, are not rigorously defined, and require some degree of arbitrariness and/or common sense. (Fortunato, 2010)


  • Heuristics are not the solution.

Leverage the Human Visual System

  • Communities are often fuzzily-defined human concepts
  • Address this by adding element of human judgement
  • Introduce a novel visualization-based community detection tool, gravicom

gravicom Functionality

  • Allows users to
    • Visually direct and interact with the steps of community detection
    • Assess the created clusters through visual and quantitative summaries
  • Standalone exploratory tool for graph data
  • Initial state to be passed to a community detection algorithm in order reduce the complexity of optimization

User Interface

Meet gravicom

Components

Site Components (1) Control Panel, (2) Data Management, (3) Connection table, (4) Graph Display, and (5) Tabset

Demo

http://shiny1.iastate.edu/andeek/gravicom
(must be VPNed to internal ISU network)

Football Example

Site Components

Football Example (Cont'd)

Conference Teams Identified Accuracy
ACC Duke, Wake Forest, Virginia, Florida State, Clemson, North Carolina, Maryland, Georgia Tech, North Carolina State 100%
Big 10 Ohio State, Penn State, Michigan, Michigan State, Purdue, Minnesota, Northwestern, Illinois, Iowa, Wisconsin, Indiana 100%
Big 12 Kansas State, Iowa State, Kansas, Texas A& M, Texas Tech, Baylor, Missouri, Texas, Oklahoma State, Colorado, Oklahoma, Nebraska 100%
C-USA Cincinnati, Louisville, Houston, Tulane, Southern Mississippi, Army, Memphis, East Carolina, Alabama Birmingham 100%
Independent Notre Dame, Navy 100%
Pac-10 Arizona, Oregon State, Washington, Washington State, Arizona State, UC LA, Stanford, Southern California, Oregon, California 100%

Football Example (Cont'd)

Conference Teams Identified Accuracy
SEC Vanderbilt, Florida, Louisiana State, South Carolina, Mississippi, Arkansas, Auburn, Kentucky, Georgia, Mississippi State, Alabama, Tennessee 100%
MAC Central Florida, Western Michigan, Miami Ohio, Ohio, Bowling Green State, Marshall, Ball State, Akron, Buffalo, Northern Illinois, Eastern Michigan, Toledo, Central Michigan, Kent 92.9%
Big East Boston College, Miami Florida, Virginia Tech, Syracuse, Temple, West Virginia, Connecticut, Pittsburgh, Rutgers 88.9%
WAC Nevada, Fresno State, Texas Christian, Tulsa, Hawaii, Rice, Southern Methodist, San Jose State, Texas El Paso 88.9%
Big West Middle Tennessee State, Louisiana Lafayette, Louisiana Monroe, Louisiana Tech 75%
Mountain West Brigham Young, San Diego State, Boise State, Wyoming, New Mexico, Nevada Las Vegas, Utah, North Texas, Utah State, New Mexico State, Colorado State, Arkansas State, Idaho, Air Force 57.1%
  • Through manual specification of conferences, we were able to correctly classify 91.3 % of the football teams into their conferences.

Graphical Devices

Theory behind the curtain

Importance of Graph Layout

  • Graph layout is an assignment of a Cartesian coordinate to each node for display in 2D or 3D
  • Layout of a graph significantly affects the number of communities that users detect within a graph
  • Humans used to detect communities \( \Rightarrow \) special attention needs to be paid to the layout being used
  • Location of a node spatially relative to other nodes in a cluster has a significant effect on user: “adjacent nodes must be placed near to each other if possible”

Force-Directed Layout

  • Physics based algorithm for graph drawing
  • Repulsive forces (charged particles) used to separate all pairs of nodes
  • Links are fixed-distance geometric constraints
  • Groups of nodes sharing multiple edges are pulled in closer proximity
  • Pseudo-gravity force keeps nodes centered in the visible area and avoids expulsion of disconnected subgraphs

Graph Simplification

  • Difficult to glean meaning from a visual representation in large/complex graphs
  • Replace repeated patterns in a graph by a representation, to simplify a network
  • Fewer nodes and edges to display \( \Rightarrow \) visual complexity of the graph visualization is greatly reduced
  • Allows the user to analyze the network structure more accurately

Technical Aspects

What makes it tick?

Page Lifecycle

Integrated Algorithm

Tools

  • Shiny: Server-client interaction
  • D3: User interface and graph layout
  • igraph: Data formatting

Tools (Cont'd)

Integrated Algorithm

Data Formatting

GML file structure:

graph
[
  directed 0
  node
  [
    id 0
    label "Node 1"
    value 100
  ]
  node
  [
    id 1
    label "Node 2"
    value 200
  ]
  edge
  [
    source 1
    target 0
  ]
]

JSON file structure:

{
  "nodes":
  [{"id":"n0","v_id":"0","v_label":"Node 1","v_value":"100"}, 
   {"id":"n1","v_id":"1","v_label":"Node 2","v_value":"200"}], 
 "edges":
  [{"source":0, "target":1}]
}

Further Work

Possible extensions to gravicom

Integrated Algorithmic Community Detection

  • Combine the benefits of human detection of communities with algorithmic detection
  • Visual detection of communities serves as an initialization step
  • Pass to iterative algorithm
  • User tracks progress and has power to dynamically set stopping criterion

How would it work?

Integrated Algorithm

Dynamic Temporal Graph Visualization

  • View a dynamic graph across time – how the edges change between nodes
  • Detect time-dependent communities
  • Add optional node labels to track progress

Questions?

Thank you!